Efficient Text and Semi-structured Data Mining: Knowledge Discovery in the Cyberspace

نویسنده

  • Hiroki Arimura
چکیده

This paper describes applications of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over texts such as proximity phrase association patterns and ordered and unordered tree patterns modeling unstructured texts and semi-structured data on the Web. Then, we consider the problem of finding the patterns that optimize a given statistical measure within the whole class of patterns in a large collection of unstructured texts. For these classes of patterns, we develop fast and robust text mining algorithms based on techniques in computational geometry, string matching, and combinatorial optimization. We successfully apply the developed text and semi-structured mining algorithms to the experiments on interactive document browsing in a large text database, keyword and common structure discovery from Web.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Topic Modeling and Classification of Cyberspace Papers Using Text Mining

The global cyberspace networks provide individuals with platforms to can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and many more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...

متن کامل

A Web Text Mining Flexible Architecture

Text Mining is an important step of Knowledge Discovery process. It is used to extract hidden information from notstructured o semi-structured data. This aspect is fundamental because much of the Web information is semi-structured due to the nested structure of HTML code, much of the Web information is linked, much of the Web information is redundant. Web Text Mining helps whole knowledge minin...

متن کامل

A Study of Text Mining Methods, Applications,and Techniques

Data mining is used to extract useful information from the large amount of data. It is used to implement and solve different types of research problems. The research related areas in data mining are text mining, web mining, image mining, sequential pattern mining, spatial mining, medical mining, multimedia mining, structure mining and graph mining. Text mining also referred to text of data mini...

متن کامل

A Detailed Study on Text Mining Techniques

Text Mining is an important step of Knowledge Discovery process. It is used to extract hidden information from not-structured or semi-structured data. This aspect is fundamental because most of the Web information is semistructured due to the nested structure of HTML code, is linked and is redundant. Web Text Mining helps whole knowledge mining process in mining, extraction and integration of u...

متن کامل

Utilizing ARM Technique in Mining Textual Data

Text mining, as one major school in Knowledge Discovery in Data (KDD), mines hidden patterns, rules, regularities and trends from textual data / non-database-data (i.e., text files, web documents, etc.). It is quite different from data mining (another well-known major school in KDD): the data structure of texts, dealt by text mining, is considered implicit, whereas traditional database-data, de...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003